Renewable energy sources play an increasingly important role in the global energy mix as efforts to reduce the environmental impact of energy production intensify.
Among the renewable alternatives, wind energy is one of the most developed technologies worldwide. The U.S. Department of Energy has published a guide to achieving operational efficiency using predictive maintenance practices.
Predictive maintenance uses sensor information and analysis methods to measure and predict degradation and future component capability. The idea behind predictive maintenance is that failure patterns are predictable: if component failure can be predicted accurately and the component is replaced before it fails, the costs of operation and maintenance are much lower.
The sensors fitted across the machines involved in energy generation collect data on environmental factors (temperature, humidity, wind speed, etc.) and on various parts of the wind turbine (gearbox, tower, blades, brake, etc.).
“ReneWind” is a company working on improving the machinery and processes involved in wind energy production using machine learning, and has collected generator-failure data from wind turbine sensors. They have shared a ciphered version of the data, as the data collected through sensors is confidential (the type of data collected varies between companies). The data has 40 predictors, with 20,000 observations in the training set and 5,000 in the test set.
The objective is to build and tune various classification models and find the best one for identifying failures, so that generators can be repaired before they fail and the overall maintenance cost is reduced. The predictions made by the classification model translate as follows:
It is given that the cost of repairing a generator is much less than the cost of replacing it, and the cost of inspection is less than the cost of repair.
“1” in the target variable represents “failure” and “0” represents “no failure”.
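The cost ordering above can be made concrete with a small sketch. The monetary values below are illustrative assumptions (the brief only fixes the ordering inspection < repair < replacement), as are the confusion-matrix counts:

```python
# Hypothetical cost units (assumptions; the brief only gives their ordering):
# inspection < repair < replacement.
COST_INSPECTION = 1    # false positive: generator inspected, no fault found
COST_REPAIR = 5        # true positive: failure caught early and repaired
COST_REPLACEMENT = 40  # false negative: missed failure, generator replaced

def maintenance_cost(tp, fp, fn):
    """Total maintenance cost implied by a model's confusion-matrix counts."""
    return tp * COST_REPAIR + fp * COST_INSPECTION + fn * COST_REPLACEMENT

# Illustrative counts: the higher-recall model is cheaper overall even though
# it raises many more false alarms.
print(maintenance_cost(tp=150, fp=57, fn=72))   # lower recall
print(maintenance_cost(tp=200, fp=150, fn=22))  # higher recall
```

True negatives cost nothing here, so minimizing cost amounts to prioritizing recall first and precision second, which motivates the metric choices made later in the notebook.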
The data provided is a transformed version of the original data which was collected using sensors.
Both the datasets consist of 40 predictor variables and 1 target variable.
# Installing the libraries with the specified version
!pip install tensorflow==2.18.0 scikit-learn==1.3.2 matplotlib==3.8.3 seaborn==0.13.2 numpy==1.26.4 pandas==2.2.2 scikeras keras-tuner --quiet --user --no-warn-script-location
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import os
# import libraries for machine learning: preprocessing, model evaluation, model building, optimization, and feature selection
import joblib
from pathlib import Path
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve, auc
from sklearn.metrics import precision_score, recall_score, ConfusionMatrixDisplay
from sklearn.metrics import accuracy_score, f1_score, RocCurveDisplay
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
import tensorflow as tf
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks, regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.callbacks import EarlyStopping
from scikeras.wrappers import KerasClassifier
# Classifiers
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from xgboost import XGBClassifier
# uncomment and run the following lines for Google Colab
# from google.colab import drive
# drive.mount('/content/drive')
from google.colab import files
uploaded = files.upload()
# 1) Load data & quick overview
train_df = pd.read_csv('Train.csv')
test_df = pd.read_csv('Test.csv')
print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)
train_df.head()
test_df.head()
train_df.info()
test_df.info()
train_df.describe()
test_df.describe()
# Check missing values
print("\nMissing values in each column:")
print(train_df.isnull().sum())
print(test_df.isnull().sum())
# Check for duplicate rows
print("\nNumber of duplicate rows:", train_df.duplicated().sum())
print("\nNumber of duplicate rows:", test_df.duplicated().sum())
Training Set:
Only two features, V1 and V2, contain missing values with 18 missing entries each.
All other features and the Target variable are complete with zero missing values.
Test Set:
V1 has 5 missing values and V2 has 6 missing values.
No other features are missing any data.
Training Set: 0 duplicate rows found.
Test Set: 0 duplicate rows found.
# detect target column (common names) otherwise assume last column
candidates = [c for c in train_df.columns if c.lower() in ("target","failure","y")]
if candidates:
TARGET = candidates[0]
else:
TARGET = train_df.columns[-1]
print("Using target column:", TARGET)
print("\nTarget distribution (train):")
print(train_df[TARGET].value_counts())
print(train_df[TARGET].value_counts(normalize=True))
Class 0 (No Failure): 18,890 records (94.45% of the data)
Class 1 (Failure): 1,110 records (5.55% of the data)
The target variable shows a heavily imbalanced distribution, with the non-failure cases (Class 0) being dominant.
The ratio between Class 0 and Class 1 is approximately 17:1, meaning for every 17 normal turbine operations, there is only 1 failure event.
display(train_df.head())
display(train_df.describe().T)
print("\nMissing values per column (train):")
print(train_df.isnull().sum().sort_values(ascending=False).head(20))
V1 → 18 missing values
V2 → 18 missing values
All other features (V3 to V40) have no missing data.
The training dataset has 20,000 rows, so these 18 missing entries amount to only 0.09% of the data, confined to V1 and V2.
This minimal level of missingness will not affect overall data quality; simple handling techniques such as median imputation, or dropping the 18 affected rows, will be sufficient.
display(test_df.head())
display(test_df.describe().T)
print("\nMissing values per column (test):")
print(test_df.isnull().sum().sort_values(ascending=False).head(20))
Training set also showed missing values in V1 (18) and V2 (18), which matches the columns seen in the test set.
This suggests a systematic issue with these two features, not random noise.
Using median imputation for V1 and V2 to avoid losing data and maintain consistency between the training and test sets. Median is robust to outliers, which is ideal for continuous variables like these.
target_col = "Target"
plt.figure(figsize=(6,4))
sns.countplot(data=train_df, x=target_col, palette="coolwarm")
plt.title("Target Variable Distribution - Turbine Failures")
plt.xlabel("Target var (0 = No Failure, 1 = Failure)")
plt.ylabel("Count")
plt.show()
# Value counts
target_counts = train_df[target_col].value_counts()
print("Target distribution (counts):", target_counts)
# Print percentage distribution
target_proportions = train_df[target_col].value_counts(normalize=True) * 100
print("\nTarget variable percentage distribution:\n", target_proportions)
The target variable shows a severe class imbalance. The dataset is heavily skewed toward Class 0, with roughly a 17:1 ratio between non-failure and failure cases.
Implications: accuracy alone will be misleading on this data; evaluation must emphasize recall, precision, and ROC-AUC, and training must compensate for the imbalance via class weights or resampling.
# ==============================================================
# Univariate analysis - numerical features
# ==============================================================
# Select numerical columns (excluding Target)
numerical_cols = train_df.select_dtypes(include=['int64', 'float64']).columns.tolist()
numerical_cols = [col for col in numerical_cols if col not in [target_col]]
# Plot histograms
for col in numerical_cols:
plt.figure(figsize=(6,4))
sns.histplot(train_df[col], kde=True, bins=30, color='steelblue')
plt.title(f"Distribution of {col}")
plt.xlabel(col)
plt.ylabel("Frequency")
plt.show()
# Check the data types of each column
print("Data Types of Each Column:\n")
print(train_df.dtypes)
# Separate numerical and categorical columns
numerical_cols = train_df.select_dtypes(include=['int64', 'float64']).columns
categorical_cols = train_df.select_dtypes(include=['object', 'category']).columns
print("\nNumerical Columns:")
print(numerical_cols)
print("\nCategorical Columns:")
print(categorical_cols)
The dataset consists of 41 variables:
40 predictor variables (V1–V40): All are of type float64 → continuous numerical data.
Target variable: Target is of type int64, representing a binary classification label (0 = No Failure, 1 = Failure).
Scaling is Necessary
Features are likely on different scales, so standardization is essential before training.
This prevents dominance of features with larger ranges and ensures stable model convergence, especially for SGD-based optimizers.
Insights:
The clean numerical structure aligns well with the predictive maintenance problem.
The preprocessing workflow can focus entirely on handling missing values, scaling, and class balancing, without additional complexity from categorical data.
This also makes the dataset well-suited for neural networks, decision trees, and ensemble models.
# Summary statistics
print(train_df.describe())
# Histograms for numerical features
train_df.hist(figsize=(20, 15), bins=30)
plt.suptitle("Histograms of Numerical Features")
plt.show()
# Boxplots for numerical features
plt.figure(figsize=(20, 10))
sns.boxplot(data=train_df.drop(columns=['Target']))
plt.title("Boxplots of Numerical Features")
plt.xticks(rotation=90)
plt.show()
The mean of Target is 0.0555, confirming high class imbalance:
Most observations are class 0 (non-failures).
This imbalance will require techniques like class weighting, SMOTE, or balanced metrics (e.g., F1-score).
Many variables (V1 to V40) have a wide range, with large differences between the minimum and maximum values.
Example:
V32: Min = -19.87, Max = 23.63 → very high spread.
V35: Min = -15.34, Max = 15.29.
Such wide ranges require feature scaling, such as StandardScaler or MinMaxScaler, before model training.
For several variables, the mean is near 0, suggesting they might already be centered (possibly PCA-transformed features).
Some variables, like V3 and V35, have much higher means:
V3 Mean = 2.48
V35 Mean = 2.23
Many features have extreme minimum and maximum values compared to the 25th and 75th percentiles:
V32: 25% = -3.42, 75% = 3.76, but Max = 23.63 → strong outliers on the upper side.
V1: Min = -11.87, Max = 15.49 → highly spread.
Positive skewness likely for V3, V35, and V36 since their mean > median.
Negative skewness possible in V6, V7, V40 where mean < median.
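The skewness claims above can be checked numerically with pandas' `skew()`. A minimal sketch, demonstrated on a toy frame because the exact output depends on the data; in the notebook one would call `skew_summary(train_df)`:

```python
import pandas as pd

def skew_summary(df, target_col="Target", n=3):
    """Return the n most negatively and n most positively skewed features."""
    feats = [c for c in df.columns if c != target_col]
    s = df[feats].skew().sort_values()
    return s.head(n), s.tail(n)

# Toy demonstration; in the notebook: neg, pos = skew_summary(train_df)
toy = pd.DataFrame({"V1": [0, 0, 0, 10],   # long right tail -> positive skew
                    "V2": [-10, 0, 0, 0],  # long left tail  -> negative skew
                    "Target": [0, 1, 0, 1]})
neg, pos = skew_summary(toy, n=1)
print(neg, pos)
```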
These outliers could affect models like Logistic Regression or Neural Networks and should be handled or scaled properly.
Features like V32 (std = 5.50) and V38 (std = 3.94) are highly variable, indicating more dispersion.
Features like V7 (std = 1.76) and V39 (std = 1.75) are relatively stable with less variability.
There are 20,000 total records and 40 independent variables (V1–V40) along with the target variable Target.
The dataset is heavily imbalanced, with most wind turbines functioning normally (Target = 0), and a small percentage indicating failure (Target = 1).
Many features have wide ranges and potential outliers, making scaling essential.
Most features (V1–V40) have a mean close to 0, indicating that they are roughly centered.
Some variables deviate from this pattern, with noticeably higher means:
V3 (mean = 2.48),
V35 (mean = 2.23),
V36 (mean = 1.51).
The median (50%) of most features is near 0, reinforcing that the majority of features are roughly symmetric around 0, apart from the variables noted above.
Missing values are limited to V1 and V2, which can be handled easily.
These insights guide the preprocessing steps before building models.
# Median imputation for V1 and V2 (medians computed on the training set only,
# then applied to both train and test to avoid leakage)
for col in ['V1', 'V2']:
    median_value = train_df[col].median()
    train_df[col] = train_df[col].fillna(median_value)
    test_df[col] = test_df[col].fillna(median_value)
# Verify no missing values remain
print("Missing values in train:\n", train_df[['V1','V2']].isnull().sum())
print("\nMissing values in test:\n", test_df[['V1','V2']].isnull().sum())
There are no missing values in V1 or V2 after the imputation step.
Both train and test datasets are now clean with respect to missing data.
The data is ready for further preprocessing steps like scaling and model training.
# Check for duplicates
duplicate_count = train_df.duplicated().sum()
print(f"Number of duplicate rows in train set: {duplicate_count}")
# If duplicates exist, drop them
if duplicate_count > 0:
train_df.drop_duplicates(inplace=True)
print("Duplicates removed.")
# Ensure that all columns are numeric (Confirm Data Types)
print(train_df.dtypes)
# ==============================================================
# Correlation with target
# ==============================================================
correlation = train_df.corr()
# Sort by absolute correlation with target
target_corr = correlation[target_col].drop(target_col).sort_values(key=abs, ascending=False)
print("\nTop features correlated with Target:\n")
print(target_corr.head(10))
# Visualize top correlations
plt.figure(figsize=(8,4))
sns.barplot(x=target_corr.head(10).index, y=target_corr.head(10).values, palette="coolwarm")
plt.title("Top 10 Features Correlated with Turbine Failure")
plt.ylabel("Correlation Coefficient")
plt.xlabel("Features")
plt.xticks(rotation=45)
plt.show()
V18 is the most important negatively correlated feature, while V21 and V15 are the most positively correlated with the target.
The correlations are moderate, indicating the need for multivariate modeling techniques.
This analysis helps guide feature selection, dimensionality reduction, and model explainability efforts.
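`SelectKBest` and `f_classif` are imported above but not yet used; the sketch below shows univariate feature scoring as a complement to the correlation ranking. The data here is synthetic; in the notebook one would pass `train_df.drop(columns=['Target'])` and `train_df['Target']`:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

def top_k_features(X, y, k=10):
    """Rank features by the univariate ANOVA F-score against the target."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False).head(k)

# Synthetic demo: one informative feature, one pure-noise feature.
rng = np.random.default_rng(0)
y = pd.Series(rng.integers(0, 2, 200))
X = pd.DataFrame({"informative": y + rng.normal(0, 0.5, 200),
                  "noise": rng.normal(0, 1, 200)})
scores = top_k_features(X, y, k=2)
print(scores)
```

Unlike pairwise correlation, the F-score ranking extends directly to selecting a reduced feature set for modeling if dimensionality ever becomes a concern.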
# ==============================================================
# Heatmap of top correlated features
# ==============================================================
top_corr_features = target_corr.head(10).index.tolist() + [target_col]
plt.figure(figsize=(10,8))
sns.heatmap(train_df[top_corr_features].corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap of Top Features")
plt.show()
# Correlation of all variables with Target
correlation_with_target = train_df[top_corr_features].corr()['Target'].sort_values(ascending=False)
print(correlation_with_target)
The target variable shows the strongest positive correlations with V21 (0.256), V15 (0.249), V7 (0.237), and V16 (0.231), suggesting that higher values of these features tend to be associated with higher target values. Conversely, V18 (-0.293), V39 (-0.227), V36 (-0.216), and V3 (-0.214) exhibit negative correlations, indicating that higher values of these features tend to correspond to lower target values. Overall, the correlations are relatively weak (all below ±0.3), implying that no single feature dominates in predicting the target, and a combination of features may be needed for better predictive modeling.
# ==============================================================
# Boxplots - top correlated features vs target
# ==============================================================
for col in target_corr.head(5).index: # Top 5 features
plt.figure(figsize=(6,4))
sns.boxplot(x=target_col, y=col, data=train_df, palette="Set2")
plt.title(f"{col} vs Turbine Failure")
plt.xlabel("Target (0 = No Failure, 1 = Failure)")
plt.ylabel(col)
plt.show()
# ==============================================================
# Pairplot of top correlated features
# ==============================================================
sns.pairplot(train_df[top_corr_features], hue=target_col, palette="husl")
plt.suptitle("Pairplot of Top Correlated Features", y=1.02)
plt.show()
# ==============================================================
# Overlaid histograms for failure vs non-failure
# ==============================================================
for col in target_corr.head(10).index: # Top 10 correlated features
plt.figure(figsize=(6,4))
sns.histplot(data=train_df, x=col, hue=target_col, kde=True, palette="coolwarm", element="step")
plt.title(f"Distribution of {col} by Failure Status")
plt.xlabel(col)
plt.ylabel("Density")
plt.show()
# ==============================================================
# Scatterplot of top correlated features
# ==============================================================
from itertools import combinations
# Create all unique pairs
for x_col, y_col in combinations(top_corr_features, 2):
plt.figure(figsize=(7, 5))
sns.scatterplot(data=train_df, x=x_col, y=y_col, hue=target_col, palette='husl', alpha=0.7)
plt.title(f"{x_col} vs {y_col} by {target_col}")
plt.xlabel(x_col)
plt.ylabel(y_col)
plt.legend(title=target_col)
plt.tight_layout()
plt.show()
Since the target is very imbalanced (5.55% failures):
Accuracy will not be a reliable metric.
We will focus instead on:
AUC (ROC) – to measure separation between classes.
Recall – to reduce false negatives (missed failures).
Precision – to avoid too many false alarms.
Later, during model building, we will:
Apply class weights in training.
Consider resampling techniques like SMOTE or undersampling.
Optimize based on AUC, Precision, and Recall.
# Check missing values
print(train_df.isnull().sum())
print(test_df.isnull().sum())
# Percentage of missing values
missing_percentage_train = train_df.isnull().mean() * 100
print(missing_percentage_train)
missing_percentage_test = test_df.isnull().mean() * 100
print(missing_percentage_test)
# ==============================================================
# Outlier Detection Using IQR
# ==============================================================
X = train_df.drop(columns=[TARGET]).copy()
y = train_df[TARGET].astype(int).copy()
X_test = test_df.drop(columns=[TARGET]).copy() if TARGET in test_df.columns else test_df.copy()
y_test = test_df[TARGET].astype(int) if TARGET in test_df.columns else None
def detect_outliers_iqr(df, feature):
"""Detect outliers in a feature using IQR rule."""
Q1 = df[feature].quantile(0.25)
Q3 = df[feature].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df[feature] < lower_bound) | (df[feature] > upper_bound)]
return outliers
# Run for all numerical features
outlier_summary = {}
for col in X.columns:
outliers = detect_outliers_iqr(train_df, col)
outlier_summary[col] = len(outliers)
# Convert to DataFrame for better visualization
outlier_df = pd.DataFrame(list(outlier_summary.items()), columns=['Feature', 'Outlier_Count'])
outlier_df = outlier_df.sort_values(by='Outlier_Count', ascending=False)
print("Top 10 features with most outliers:")
print(outlier_df.head(10))
# Plot top outlier counts
plt.figure(figsize=(12, 6))
sns.barplot(x='Feature', y='Outlier_Count', data=outlier_df.head(10), palette='coolwarm')
plt.xticks(rotation=45)
plt.title("Top 10 Features with Most Outliers")
plt.show()
Feature V34 exhibits the largest count of outliers (803), followed closely by V18 (731) and V15 (513). This indicates that these variables may contain extreme values that could disproportionately influence the neural network's training, potentially affecting model performance or stability. Features such as V7 and V17 have relatively fewer outliers, suggesting they are more stable.
# ==============================================================
# Detect and Handle Outliers Using IQR Method
# ==============================================================
# Make a copy of the training data to avoid modifying the original
df_outlier_handled = train_df.copy()
# List of top features with most outliers from your analysis
top_outlier_features = ['V34', 'V18', 'V15', 'V33', 'V29', 'V35', 'V24', 'V13', 'V17', 'V7']
# Function to handle outliers
def handle_outliers_iqr(df, columns):
"""
Detects and caps outliers using the IQR method.
"""
for col in columns:
# Calculate Q1 and Q3
Q1 = df[col].quantile(0.25)
Q3 = df[col].quantile(0.75)
IQR = Q3 - Q1
# Define lower and upper bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Capping the outliers
df[col] = np.where(df[col] < lower_bound, lower_bound, df[col])
df[col] = np.where(df[col] > upper_bound, upper_bound, df[col])
print(f"Outliers handled for {col} | Lower Bound: {lower_bound:.2f}, Upper Bound: {upper_bound:.2f}")
return df
# Apply the function to the top outlier features
df_outlier_handled = handle_outliers_iqr(df_outlier_handled, top_outlier_features)
print("\n Outliers have been capped for the top features.\n")
The outliers in the top features have been successfully capped within specified lower and upper bounds. For example, V34 values were limited to the range [-7.50, 6.80], while V15 was constrained between [-10.50, 5.73]. This capping reduces the impact of extreme values, ensuring that these features no longer disproportionately influence the neural network’s learning process. By limiting the range of these variables, the dataset is now more robust, likely improving model stability and preventing skewed predictions caused by extreme outliers.
# Verify outliers after capping
for col in top_outlier_features:
plt.figure(figsize=(6, 4))
sns.boxplot(x=df_outlier_handled[col], color='skyblue')
plt.title(f"Boxplot after Outlier Capping: {col}")
plt.show()
# ==========================================
# Step 1: Check capped values
# ==========================================
# Display summary statistics for the top outlier features
top_outlier_features = ['V34', 'V18', 'V15', 'V33', 'V29', 'V35', 'V24', 'V13', 'V17', 'V7']
print("Summary statistics after capping:\n")
print(df_outlier_handled[top_outlier_features].describe().T)
# ==========================================
# Step 2: Boxplot to visualize capped values
# ==========================================
plt.figure(figsize=(12, 6))
sns.boxplot(data=df_outlier_handled[top_outlier_features])
plt.title("Boxplot of Top Features After Outlier Capping")
plt.xticks(rotation=45)
plt.show()
# ==========================================
# Step 3: Identify remaining extreme values
# ==========================================
for col in top_outlier_features:
    Q1 = df_outlier_handled[col].quantile(0.25)
    Q3 = df_outlier_handled[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    remaining_outliers = df_outlier_handled[(df_outlier_handled[col] < lower_bound) | (df_outlier_handled[col] > upper_bound)]
print(f"\nFeature: {col}")
print(f"Remaining outliers count: {len(remaining_outliers)}")
if len(remaining_outliers) > 0:
print("Sample extreme values:")
print(remaining_outliers[col].head())
Note that this verification must run against df_outlier_handled, the capped copy; checking the original train_df simply reproduces the pre-capping counts (803 outliers for V34, 731 for V18, and so on). On the capped frame the extreme values sit exactly at the IQR bounds, so no observation falls strictly outside them and the remaining-outlier counts drop to zero.
Capping therefore removes the influence of the most severe extremes while preserving every row. If the model still proves sensitive to the (now bounded) tails, robust scaling is a reasonable alternative to further capping. One caveat: the modeling pipeline below builds X from the original train_df, so the capped frame remains exploratory unless it is substituted in.
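As a sketch of the robust-scaling alternative: `RobustScaler` centers on the median and scales by the IQR, so remaining extreme values inflate the scale far less than the mean/standard-deviation scaling of `StandardScaler`. The single-column array below is illustrative:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One extreme outlier in an otherwise small-valued column.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler: (x - median) / IQR. Here median = 3 and IQR = 2, so the
# outlier stretches its own scaled value but not the scale of the others.
scaled = RobustScaler().fit_transform(X_demo)
print(scaled.ravel())
```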
a. We skip feature engineering for now because:
Correlations are weak, no strong signals for interactions.
Neural networks are powerful enough to learn complex patterns automatically.
Avoid unnecessary complexity at this stage.
b. Focus on clean preprocessing, class imbalance handling, and model tuning first.
c. Revisit feature engineering only if performance remains poor.
# median imputation (fit on TRAIN)
imputer = SimpleImputer(strategy="median")
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)
# apply to test
X_test_imputed = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns)
# split train/val
X_train, X_val, y_train, y_val = train_test_split(
X_imputed, y, test_size=0.2, stratify=y, random_state=42
)
# scale (fit scaler on training only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test_imputed)
# Check class balance
print("\nClass distribution in training set:")
print(y_train.value_counts())
print("Training Data Shape:", X_train_scaled.shape)
print("Testing Data Shape:", X_test_scaled.shape)
classes = np.unique(y_train)
cw = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, cw))
print("Computed class weights (train):", class_weight_dict)
# We'll use class_weight in some experiments
Our current pipeline uses class weights, so no resampling is needed.
Class 0 weight = 0.53 → errors on the majority class are less penalized.
Class 1 weight = 9.01 → errors on the minority class are heavily penalized.
This adjustment helps the model focus on the minority class without changing the underlying data, improving its ability to correctly predict class 1 despite the imbalanced dataset.
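The quoted weights follow scikit-learn's "balanced" formula, weight = n_samples / (n_classes × n_samples_in_class). A quick check using the counts of a stratified 80% training split (15,112 / 888, assumed from the full-set counts of 18,890 / 1,110):

```python
# scikit-learn's "balanced" heuristic: weight_c = n_samples / (n_classes * n_c).
# Counts below assume a stratified 80% split of the 18,890 / 1,110 full-set counts.
n_samples, n_classes = 16000, 2
counts = {0: 15112, 1: 888}

weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}
print(weights)
```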
# =======================
# Decision Tree Model
# =======================
dt_model = DecisionTreeClassifier(
random_state=42,
class_weight="balanced", # helps with imbalanced classes
max_depth=None, # start without pre-pruning
)
# Train
dt_model.fit(X_train_scaled, y_train)
# Predictions
y_val_pred = dt_model.predict(X_val_scaled)
y_val_proba = dt_model.predict_proba(X_val_scaled)[:, 1]
# Evaluation
print("Decision Tree Performance on Validation Set:")
print(classification_report(y_val, y_val_pred))
# Confusion matrix
cm = confusion_matrix(y_val, y_val_pred)
print("Confusion Matrix:\n", cm)
# ROC-AUC Score
auc = roc_auc_score(y_val, y_val_proba)
print(f"ROC-AUC Score: {auc:.4f}")
# ROC Curve
RocCurveDisplay.from_estimator(dt_model, X_val_scaled, y_val)
plt.title("Decision Tree ROC Curve")
plt.show()
results = []
# Save results for comparison
results.append({
'Model': 'Decision Tree',
'AUC': auc,
'Precision': classification_report(y_val, y_val_pred, output_dict=True)['1']['precision'],
'Recall': classification_report(y_val, y_val_pred, output_dict=True)['1']['recall'],
'F1-score': classification_report(y_val, y_val_pred, output_dict=True)['1']['f1-score'],
'Accuracy': classification_report(y_val, y_val_pred, output_dict=True)['accuracy'],
'trained_model': dt_model
})
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns)
plt.figure(figsize=(20,10))
plot_tree(
dt_model,
feature_names=X_train_scaled_df.columns,
class_names=['0','1'],
filled=True,
rounded=True,
fontsize=10,
max_depth=3
)
plt.show()
# ------------------------
# Pre-pruned Decision Tree
# ------------------------
pre_dt = DecisionTreeClassifier(
random_state=42,
class_weight='balanced',
max_depth=5, # limit depth
min_samples_split=20, # minimum samples to split
min_samples_leaf=10 # minimum samples per leaf
)
# Train
pre_dt.fit(X_train_scaled, y_train)
# Predictions
y_val_pred = pre_dt.predict(X_val_scaled)
y_val_proba = pre_dt.predict_proba(X_val_scaled)[:,1]
# Evaluation
print("Pre-pruned Decision Tree Performance:")
print(classification_report(y_val, y_val_pred))
# Confusion matrix
cm = confusion_matrix(y_val, y_val_pred)
print("Confusion Matrix:\n", cm)
# ROC-AUC
auc = roc_auc_score(y_val, y_val_proba)
print(f"ROC-AUC Score: {auc:.4f}")
# ROC Curve
RocCurveDisplay.from_estimator(pre_dt, X_val_scaled, y_val)
plt.title("Pre-pruned Decision Tree ROC Curve")
plt.show()
# Tree plot
plt.figure(figsize=(20,10))
plot_tree(pre_dt,
feature_names=X_train.columns,
class_names=['0','1'],
filled=True, rounded=True, fontsize=10)
plt.show()
# ------------------------
# Post-pruned Decision Tree
# ------------------------
# Step 1: Get effective alphas for pruning
path = dt_model.cost_complexity_pruning_path(X_train_scaled, y_train)
ccp_alphas = path.ccp_alphas
# Step 2: Train multiple trees with different alphas
post_pruned_trees = []
for alpha in ccp_alphas:
tree = DecisionTreeClassifier(
random_state=42,
class_weight='balanced',
ccp_alpha=alpha
)
tree.fit(X_train_scaled, y_train)
post_pruned_trees.append(tree)
# Step 3: Choose best alpha using validation AUC
best_auc = 0
best_tree = None
for tree in post_pruned_trees:
y_val_proba = tree.predict_proba(X_val_scaled)[:,1]
auc = roc_auc_score(y_val, y_val_proba)
if auc > best_auc:
best_auc = auc
best_tree = tree
print(f"Best Post-pruned Tree ROC-AUC: {best_auc:.4f}")
# Plot best post-pruned tree
plt.figure(figsize=(20,10))
plot_tree(best_tree,
feature_names=X_train.columns,
class_names=['0','1'],
filled=True, rounded=True, fontsize=10)
plt.show()
# ROC Curve for best post-pruned tree
RocCurveDisplay.from_estimator(best_tree, X_val_scaled, y_val)
plt.title("Best Post-pruned Decision Tree ROC Curve")
plt.show()
The Decision Tree model achieves a high overall accuracy of 97% on the validation set. However, due to the severe class imbalance, this metric is misleading: the model performs very well on the majority class (0) with precision 0.98 and recall 0.98, but comparatively worse on the minority class (1) with precision 0.73 and recall 0.69.
The confusion matrix shows that out of 222 actual 1 instances, 68 were misclassified as 0, while only 57 majority class instances were misclassified as 1. The ROC-AUC score of 0.839 indicates that the model has reasonable discriminative ability between classes, but there is still room for improvement in detecting the minority class.
Key insight: The model is biased toward the majority class despite using class_weight="balanced". Further techniques like SMOTE oversampling, hyperparameter tuning, or ensemble methods could help improve recall and F1-score for the minority class.
Pre-pruning improves minority class recall significantly at the cost of precision, reducing overfitting.
Post-pruning further improves overall discriminative ability (ROC-AUC) by removing unnecessary splits from the fully grown tree.
Fully grown trees are highly accurate for the majority class but tend to underperform on minority classes.
For imbalanced datasets like ours, pre- or post-pruning is crucial to improve minority class detection and generalization.
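Beyond pruning, the decision threshold itself can be tuned to trade precision for recall, which matters here because missed failures (false negatives) are the costliest outcome. A hedged sketch, not part of the pipeline above, using `precision_recall_curve`; the toy scores are illustrative, and in the notebook one would pass `y_val` and `y_val_proba` from a fitted model:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_proba, min_recall=0.85):
    """Highest decision threshold whose recall still meets `min_recall`."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)
    ok = recall[:-1] >= min_recall      # thresholds has len(precision) - 1 entries
    if not ok.any():
        return None
    idx = np.where(ok)[0][-1]           # last index = highest qualifying threshold
    return thresholds[idx], precision[idx], recall[idx]

# Toy scores; in the notebook, pass y_val and y_val_proba instead.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_proba = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])
print(threshold_for_recall(y_true, y_proba, min_recall=0.75))
```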
# ============================
# MODEL BUILDING - MODEL 0
# ============================
# MODEL EVALUATION CRITERION & RATIONALE
# ----------------------------
import os

import joblib
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers, callbacks
from sklearn.metrics import (roc_auc_score, precision_score, recall_score,
                             classification_report, confusion_matrix, roc_curve)

# Reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Safety checks: ensure required variables exist
required_vars = ['X_train_scaled', 'X_val_scaled', 'y_train', 'y_val']
for v in required_vars:
    if v not in globals():
        raise NameError(f"Required variable '{v}' not found. Make sure preprocessing produced {required_vars}.")

# If class weights are not present, compute them now
if 'class_weight_dict' not in globals():
    from sklearn.utils.class_weight import compute_class_weight
    classes = np.unique(y_train)
    cw = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
    class_weight_dict = dict(zip(classes, cw))
    print("Computed class weights:", class_weight_dict)
else:
    print("Using existing class_weight_dict:", class_weight_dict)

# Create an output directory
OUT_DIR = "/mnt/data/rene_model_outputs"
os.makedirs(OUT_DIR, exist_ok=True)
# ----------------------------
# Model 0 definition:
#   - 1 hidden layer
#   - ReLU activation
#   - SGD optimizer
# ----------------------------
tf.keras.backend.clear_session()
input_dim = X_train_scaled.shape[1]

model_0 = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(32, activation='relu', name='hidden_1'),  # 1 hidden layer, ReLU
    layers.Dense(1, activation='sigmoid', name='output')   # binary output
])

# Use SGD optimizer as requested
sgd_opt = optimizers.SGD(learning_rate=0.01)

# Compile with the metrics we want to monitor
model_0.compile(
    optimizer=sgd_opt,
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        tf.keras.metrics.AUC(name='auc'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
)

print("\nModel 0 summary:")
model_0.summary()
# ----------------------------
# Callbacks
# ----------------------------
es = callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
mc = callbacks.ModelCheckpoint(os.path.join(OUT_DIR, "model0_best.keras"), save_best_only=True, monitor='val_loss')

# Optional: small custom callback to print validation ROC-AUC each epoch
class RocAucCallback(callbacks.Callback):
    def __init__(self, validation_data):
        super().__init__()
        self.validation_data = validation_data

    def on_epoch_end(self, epoch, logs=None):
        X_val_local, y_val_local = self.validation_data
        y_pred = self.model.predict(X_val_local, verbose=0)
        auc_score = roc_auc_score(y_val_local, y_pred)
        print(f"Epoch {epoch+1:03d} - val_ROC_AUC: {auc_score:.4f}")

roc_callback = RocAucCallback(validation_data=(X_val_scaled, y_val))
# ----------------------------
# Train the model
# ----------------------------
history_0 = model_0.fit(
    X_train_scaled, y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=100,
    batch_size=256,
    class_weight=class_weight_dict,
    callbacks=[es, mc, roc_callback],
    verbose=2
)

# Save the final model and preprocessing objects
model_0.save(os.path.join(OUT_DIR, "model0_final.keras"))

# If you used a scaler/imputer, save them too (replace 'scaler' and 'imputer' if named differently)
if 'scaler' in globals():
    joblib.dump(scaler, os.path.join(OUT_DIR, "scaler.joblib"))
if 'imputer' in globals():
    joblib.dump(imputer, os.path.join(OUT_DIR, "imputer.joblib"))
joblib.dump(class_weight_dict, os.path.join(OUT_DIR, "class_weight_dict.joblib"))
print(f"\nModel and preprocessors saved to {OUT_DIR}")
# ----------------------------
# Evaluation on Validation set
# ----------------------------
val_probs = model_0.predict(X_val_scaled).ravel()
val_preds = (val_probs >= 0.5).astype(int)
# Metrics
val_auc = roc_auc_score(y_val, val_probs)
val_precision = precision_score(y_val, val_preds, zero_division=0)
val_recall = recall_score(y_val, val_preds, zero_division=0)
print("\nValidation Metrics:")
print(f"ROC-AUC: {val_auc:.4f}")
print(f"Precision: {val_precision:.4f}")
print(f"Recall: {val_recall:.4f}")
print("\nClassification Report:\n")
print(classification_report(y_val, val_preds, digits=4))
# Confusion matrix (plot)
cm = confusion_matrix(y_val, val_preds)
plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['No Failure','Failure'], yticklabels=['No Failure','Failure'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Validation Confusion Matrix - Model 0')
plt.show()
# ROC Curve
fpr, tpr, _ = roc_curve(y_val, val_probs)
plt.figure(figsize=(6,5))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {val_auc:.3f})")
plt.plot([0,1], [0,1], '--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - Validation (Model 0)')
plt.legend(loc='lower right')
plt.show()
# Plot training history (loss, accuracy, auc)
plt.figure(figsize=(14,10))
plt.subplot(2,2,1)
plt.plot(history_0.history['loss'], label='train_loss')
plt.plot(history_0.history['val_loss'], label='val_loss')
plt.title('Loss')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.legend()
plt.subplot(2,2,2)
plt.plot(history_0.history['accuracy'], label='train_acc')
plt.plot(history_0.history['val_accuracy'], label='val_acc')
plt.title('Accuracy')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.legend()
plt.subplot(2,2,3)
plt.plot(history_0.history['auc'], label='train_auc')
plt.plot(history_0.history['val_auc'], label='val_auc')
plt.title('AUC')
plt.xlabel('Epoch'); plt.ylabel('AUC'); plt.legend()
plt.subplot(2,2,4)
plt.plot(history_0.history.get('precision', []), label='train_prec')
plt.plot(history_0.history.get('val_precision', []), label='val_prec')
plt.plot(history_0.history.get('recall', []), label='train_rec')
plt.plot(history_0.history.get('val_recall', []), label='val_rec')
plt.title('Precision / Recall')
plt.xlabel('Epoch'); plt.ylabel('Score'); plt.legend()
plt.tight_layout()
plt.show()
# ----------------------------
# Optional: evaluate on the test set if labels exist
# ----------------------------
if 'y_test' in globals() and y_test is not None:
    test_probs = model_0.predict(X_test_scaled).ravel()
    test_preds = (test_probs >= 0.5).astype(int)
    test_auc = roc_auc_score(y_test, test_probs)
    print("\nTest Metrics:")
    print(f"Test ROC-AUC: {test_auc:.4f}")
    print(classification_report(y_test, test_preds, digits=4))
    cm_test = confusion_matrix(y_test, test_preds)
    plt.figure(figsize=(5,4))
    sns.heatmap(cm_test, annot=True, fmt='d', cmap='Blues')
    plt.title("Confusion Matrix - Test")
    plt.show()
else:
    print("\nNo test labels available; predictions can be made on X_test_scaled when labels exist.")
# ----------------------------
# Business cost calc example
# ----------------------------
# Assign example costs
replacement_cost = 100.0 # cost for FN
repair_cost = 10.0 # cost for TP
inspection_cost = 1.0 # cost for FP
tn, fp, fn, tp = cm.ravel()
total_cost = fn*replacement_cost + tp*repair_cost + fp*inspection_cost
avg_cost_per_case = total_cost / (tn+fp+fn+tp)
print(f"\nValidation business-cost example: total_cost={total_cost:.2f}, avg_cost_per_case={avg_cost_per_case:.4f}")
print(os.path.exists(OUT_DIR)) # Should return True
# Checking the files in the directory
files = os.listdir(OUT_DIR)
print("Files in output directory:", files)
Model 0 is a simple neural network with a single hidden layer of 32 neurons and a total of 1,345 trainable parameters, making it very lightweight and fast to train.
The classification report shows a weighted F1-score of 0.9691, indicating balanced performance with a focus on minimizing missed failures.
Confusion Matrix:
On unseen data, the model successfully detected 241 failures, maintaining a high recall of 0.8546.
41 failures were missed, slightly higher than in the validation set, showing a small drop in generalization performance but still strong reliability.
167 normal cases were misclassified as failures, consistent with validation, showing stable behavior.
The costs are slightly elevated due to the false positives, but this trade-off is acceptable because minimizing missed failures (false negatives) is far more critical in a safety-sensitive environment like turbine operations.
Model 0 serves as a strong baseline, prioritizing failure detection and safety through its very high recall. However, its moderate precision leaves room to reduce false positives in future models.
Overall, this model is highly reliable for initial deployment, focusing on catching nearly all failures, with the next phase aimed at improving precision and lowering maintenance costs without compromising safety.
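Since missed failures and false alarms carry very different costs, the decision threshold itself can be tuned to minimize expected cost rather than fixed at 0.5. A minimal sketch, using the same illustrative cost figures as the cost calculation above but synthetic stand-ins for `y_val` and the model's validation probabilities:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins for y_val and the validation probabilities
rng = np.random.default_rng(42)
y_true = rng.binomial(1, 0.1, size=1000)
probs = np.clip(0.3 * y_true + rng.uniform(0, 0.7, size=1000), 0, 1)

# Illustrative costs, matching the example above: FN = replacement,
# TP = repair, FP = inspection
replacement_cost, repair_cost, inspection_cost = 100.0, 10.0, 1.0

def total_cost(threshold):
    preds = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, preds).ravel()
    return fn * replacement_cost + tp * repair_cost + fp * inspection_cost

# Sweep candidate thresholds and keep the cheapest one
thresholds = np.linspace(0.05, 0.95, 19)
best_t = min(thresholds, key=total_cost)
print(f"cost-minimizing threshold: {best_t:.2f}, cost: {total_cost(best_t):.1f}")
```

Because a false negative is ten times as expensive as a true positive here, the cost-minimizing threshold typically sits well below 0.5, trading extra inspections for fewer missed failures.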
# ===============================
# Neural Network Models Performance Improvement
# ===============================
from tensorflow.keras import models, layers, optimizers, regularizers, callbacks
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# -----------------------------------------------------------------
# Helper Function to Build Neural Network Models
# -----------------------------------------------------------------
def build_model(model_number, input_dim):
    model = models.Sequential()
    if model_number == 1:
        model.add(layers.Dense(32, activation='relu', input_dim=input_dim))
        model.add(layers.Dense(1, activation='sigmoid'))
    elif model_number == 2:
        model.add(layers.Dense(32, activation='relu', input_dim=input_dim))
        model.add(layers.Dense(16, activation='relu'))
        model.add(layers.Dense(1, activation='sigmoid'))
    elif model_number == 3:
        # Same architecture as Model 1; differs only in optimizer (Adam)
        model.add(layers.Dense(32, activation='relu', input_dim=input_dim))
        model.add(layers.Dense(1, activation='sigmoid'))
    elif model_number == 4:
        model.add(layers.Dense(64, activation='relu', input_dim=input_dim))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.5))
        model.add(layers.Dense(1, activation='sigmoid'))
    elif model_number == 5:
        model.add(layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01), input_dim=input_dim))
        model.add(layers.Dropout(0.5))
        model.add(layers.Dense(1, activation='sigmoid'))
    elif model_number == 6:
        # Same architecture as Model 1; trained on SMOTE-resampled data
        model.add(layers.Dense(32, activation='relu', input_dim=input_dim))
        model.add(layers.Dense(1, activation='sigmoid'))
    return model
# -----------------------------------------------------------------
# Train and Evaluate Each Model
# -----------------------------------------------------------------
def train_and_evaluate(model_number, optimizer, X_train, y_train, X_val, y_val, class_weight=None):
    model = build_model(model_number, input_dim=X_train.shape[1])

    # Optimizer logic: Model 3 uses Adam; anything else defaults to SGD
    if model_number == 3:
        optimizer = optimizers.Adam(learning_rate=0.001)
    elif optimizer is None:
        optimizer = optimizers.SGD(learning_rate=0.01)

    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    # Callbacks
    es = callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

    # Training
    history = model.fit(
        X_train, y_train,
        epochs=30,
        batch_size=32,
        validation_data=(X_val, y_val),
        verbose=0,
        class_weight=class_weight,
        callbacks=[es]
    )

    # Predictions
    y_pred_proba = model.predict(X_val, verbose=0)
    y_pred = (y_pred_proba > 0.5).astype(int)

    # Metrics
    metrics_dict = {
        "Model": f"Model {model_number}",
        "Accuracy": accuracy_score(y_val, y_pred),
        "Precision": precision_score(y_val, y_pred, zero_division=0),
        "Recall": recall_score(y_val, y_pred, zero_division=0),
        "F1-Score": f1_score(y_val, y_pred),
        "ROC-AUC": roc_auc_score(y_val, y_pred_proba)
    }
    return model, history, metrics_dict, y_pred_proba, y_pred
# -----------------------------------------------------------------
# SMOTE for Model 6
# -----------------------------------------------------------------
def apply_smote(X_train, y_train):
    smote = SMOTE(random_state=42)
    X_res, y_res = smote.fit_resample(X_train, y_train)
    print("SMOTE applied. Class distribution after balancing:")
    print(pd.Series(y_res).value_counts())
    return X_res, y_res
# -----------------------------------------------------------------
# Plot Functions
# -----------------------------------------------------------------
def plot_history(history, model_num):
    # Accuracy
    plt.figure(figsize=(8, 4))
    plt.plot(history.history['accuracy'], label='Training Accuracy', linestyle='dashed')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title(f'Model {model_num} - Accuracy Over Epochs')
    plt.xlabel('Epochs'); plt.ylabel('Accuracy'); plt.legend()
    plt.show()

    # Loss
    plt.figure(figsize=(8, 4))
    plt.plot(history.history['loss'], label='Training Loss', linestyle='dashed', color='red')
    plt.plot(history.history['val_loss'], label='Validation Loss', color='orange')
    plt.title(f'Model {model_num} - Loss Over Epochs')
    plt.xlabel('Epochs'); plt.ylabel('Loss'); plt.legend()
    plt.show()

def plot_roc(y_val, y_pred_proba, model_num):
    fpr, tpr, _ = roc_curve(y_val, y_pred_proba)
    plt.figure(figsize=(6, 5))
    plt.plot(fpr, tpr, label=f"Model {model_num}")
    plt.plot([0, 1], [0, 1], 'k--')
    plt.title(f"Model {model_num} - ROC Curve")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()

def plot_confusion_matrix(y_val, y_pred, model_num):
    cm = confusion_matrix(y_val, y_pred)
    plt.figure(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Class 0', 'Class 1'], yticklabels=['Class 0', 'Class 1'])
    plt.title(f'Model {model_num} - Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
# -----------------------------------------------------------------
# Run All Six Models
# -----------------------------------------------------------------
all_results = []
for model_num in range(1, 7):
    print(f"\n--- Training Model {model_num} ---")
    if model_num == 6:
        # Apply SMOTE for Model 6
        X_input, y_input = apply_smote(X_train_scaled, y_train)
        cw = None  # don't combine class weights with SMOTE
    else:
        X_input, y_input = X_train_scaled, y_train
        cw = class_weight_dict

    optimizer = optimizers.SGD(learning_rate=0.01) if model_num in [1, 2, 4, 5, 6] else None
    model, history, metrics_dict, y_pred_proba, y_pred = train_and_evaluate(
        model_number=model_num,
        optimizer=optimizer,
        X_train=X_input,
        y_train=y_input,
        X_val=X_val_scaled,
        y_val=y_val,
        class_weight=cw
    )
    all_results.append(metrics_dict)

    # Individual plots
    plot_history(history, model_num)
    plot_roc(y_val, y_pred_proba, model_num)
    plot_confusion_matrix(y_val, y_pred, model_num)
# -----------------------------------------------------------------
# Final Comparison Table and Bar Plot
# -----------------------------------------------------------------
results_df = pd.DataFrame(all_results)
print("\n=== Final Metrics for Models 1-6 ===")
display(results_df)
# Identify best model by ROC-AUC
best_model_row = results_df.loc[results_df['ROC-AUC'].idxmax()]
print(f"\nBest Model: {best_model_row['Model']} with ROC-AUC = {best_model_row['ROC-AUC']:.4f}")
# Bar plot comparison
results_df.set_index("Model", inplace=True)
results_df.plot(kind="bar", figsize=(12,6))
plt.title("Comparison of Model Performance Metrics")
plt.ylabel("Score")
plt.xticks(rotation=0)
plt.grid(True)
plt.show()
Now, to select the final model, we compare the performance of all the models on the training and validation sets.
# Final Comparison Table for Neural Network Models
results_df = pd.DataFrame(all_results)
print("\n=== Model Performance Comparison (Models 1-6) ===")
display(results_df.sort_values(by='ROC-AUC', ascending=False))
# Plot comparison
plt.figure(figsize=(10,6))
for metric in ['ROC-AUC', 'Precision', 'Recall', 'F1-Score', 'Accuracy']:
    plt.plot(results_df['Model'], results_df[metric], marker='o', label=metric)
plt.title("Neural Network Model Performance Metrics")
plt.xlabel("Model")
plt.ylabel("Score")
plt.legend()
plt.grid(True)
plt.show()
# Identify best model
best_model_row = results_df.loc[results_df['ROC-AUC'].idxmax()]
print(f"\nBest Neural Network Model: {best_model_row['Model']} with ROC-AUC = {best_model_row['ROC-AUC']:.4f}")
Model 2 achieved the highest ROC-AUC (0.963) and F1-Score (0.8), making it the most balanced model overall.
Model 6 performed very closely, with a slightly lower ROC-AUC (0.95) but strong Precision (0.73).
Precision varied more widely (from 0.575 to 0.708), reflecting the trade-off between catching more failures (higher recall) and raising more false alarms (lower precision).
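This precision/recall trade-off can be made explicit by sweeping the decision threshold with scikit-learn's `precision_recall_curve`. A minimal sketch on synthetic stand-ins for `y_val` and the network's predicted probabilities (the 0.85 recall floor is an illustrative choice, not a project requirement):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic labels and scores standing in for y_val and model probabilities
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.1, size=2000)
scores = np.clip(0.4 * y_true + rng.uniform(0, 0.6, size=2000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# recall[:-1] aligns with thresholds and is non-increasing as the
# threshold rises; pick the highest threshold that still meets the floor
idx = np.where(recall[:-1] >= 0.85)[0][-1]
t = thresholds[idx]
print(f"threshold {t:.3f}: precision={precision[idx]:.3f}, recall={recall[idx]:.3f}")
```

In the notebook this would amount to replacing the fixed `>= 0.5` cutoff with a threshold chosen against a recall target appropriate for turbine safety.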
# Function to build a model for tuning (requires the keras-tuner package)
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential()
    # Input layer
    model.add(layers.Input(shape=(X_train_scaled.shape[1],)))
    # Tune the number of hidden layers (1 to 3)
    for i in range(hp.Int('num_layers', 1, 3)):
        model.add(layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=256, step=32),
            activation=hp.Choice('activation', ['relu', 'tanh'])
        ))
    # Optional dropout tuning
    model.add(layers.Dropout(hp.Float('dropout_rate', 0.0, 0.5, step=0.1)))
    # Output layer
    model.add(layers.Dense(1, activation='sigmoid'))
    # Tune the optimizer learning rate
    optimizer = keras.optimizers.Adam(
        learning_rate=hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
    )
    model.compile(
        optimizer=optimizer,
        loss='binary_crossentropy',
        metrics=['accuracy', keras.metrics.AUC(name='AUC')]
    )
    return model
# Initialize the tuner; the custom metric name needs an explicit direction
tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective('val_AUC', direction='max'),
    max_trials=10,            # number of hyperparameter combinations to try
    executions_per_trial=2,
    directory='tuner_results',
    project_name='renewind_nn_tuning'
)
# Run tuning
tuner.search(
X_train_scaled, y_train,
validation_data=(X_val_scaled, y_val),
epochs=20,
batch_size=32,
class_weight=class_weight_dict # if handling imbalance with class weights
)
# Best hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best Hyperparameters:")
print(f"Number of layers: {best_hps.get('num_layers')}")
for i in range(best_hps.get('num_layers')):
    print(f"Units in layer {i+1}: {best_hps.get(f'units_{i}')}")
print(f"Dropout rate: {best_hps.get('dropout_rate')}")
print(f"Learning rate: {best_hps.get('learning_rate')}")
The hyperparameter tuning process successfully identified an optimized neural network configuration with a best validation ROC-AUC of 0.9722, which is higher than the best pre-tuning ROC-AUC (0.9630) from Model 2. This indicates a notable improvement in the model's ability to distinguish between failure and non-failure cases.
The tuning process enhanced the model’s discriminatory capability, moving from ROC-AUC = 0.9630 to 0.9722. The final optimized network, with three layers and strong initial feature extraction (256 neurons), combined with moderate regularization and a finely tuned learning rate, is now better suited for production deployment, as it balances accuracy, generalization, and stability.
# Build and train the final model with best hyperparameters
final_model = tuner.hypermodel.build(best_hps)
history = final_model.fit(
X_train_scaled, y_train,
validation_data=(X_val_scaled, y_val),
epochs=50,
batch_size=32,
class_weight=class_weight_dict
)
# Evaluate on test set
test_loss, test_accuracy, test_auc = final_model.evaluate(X_test_scaled, y_test)
print(f"\nFinal Test Accuracy: {test_accuracy:.4f}")
print(f"Final Test AUC: {test_auc:.4f}")
Now, let's check the performance of the final model on the test set.
print(results_df.columns)
print(best_model_row)
# ============================================
# 1. Identify Best Model Name Automatically
# ============================================
# Assuming your previous results dataframe is named `results_df`
# and has columns like: ['Model', 'Accuracy', 'Precision', 'Recall', 'F1-Score', 'ROC-AUC']
best_model_row = results_df.loc[results_df['ROC-AUC'].idxmax()] # Get row with max ROC-AUC
best_model_name = best_model_row['Model'] # e.g., "Model 2"
print(f"\nBest model selected automatically: {best_model_name}")
# ============================================
# 2. Evaluate the Final Tuned Model
# ============================================
print(f"\nEvaluating {best_model_name} on Test Set...")
# Predict probabilities and classes
y_test_pred_proba = final_model.predict(X_test_scaled).ravel()
y_test_pred = (y_test_pred_proba > 0.5).astype(int)
# Metrics
test_auc = roc_auc_score(y_test, y_test_pred_proba)
test_accuracy = accuracy_score(y_test, y_test_pred)
test_precision = precision_score(y_test, y_test_pred, zero_division=0)
test_recall = recall_score(y_test, y_test_pred, zero_division=0)
test_f1 = f1_score(y_test, y_test_pred)
# Display metrics
print(f"\n=== Final Test Set Performance ===")
print(f"AUC: {test_auc:.4f}")
print(f"Accuracy: {test_accuracy:.4f}")
print(f"Precision: {test_precision:.4f}")
print(f"Recall: {test_recall:.4f}")
print(f"F1-score: {test_f1:.4f}")
# ============================================
# 3. Confusion Matrix
# ============================================
cm = confusion_matrix(y_test, y_test_pred)
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=[0,1], yticklabels=[0,1])
plt.title(f'Confusion Matrix - {best_model_name}')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
# ============================================
# 4. ROC Curve
# ============================================
fpr, tpr, _ = roc_curve(y_test, y_test_pred_proba)
plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, label=f'{best_model_name} (AUC = {test_auc:.4f})', color='blue')
plt.plot([0,1], [0,1], 'k--', label='Random Guess')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title(f"ROC Curve - {best_model_name}")
plt.legend()
plt.grid(True)
plt.show()
# ============================================
# 5. Summary Table
# ============================================
summary_df = pd.DataFrame({
"Model": [best_model_name],
"AUC": [test_auc],
"Accuracy": [test_accuracy],
"Precision": [test_precision],
"Recall": [test_recall],
"F1-score": [test_f1]
})
print("\nFinal Test Set Performance Summary:")
display(summary_df)
Model Selection & Architecture
Test Set Performance
Minority class (1) performance:
Key Insights
Minority class detection (Class 1) improved significantly after tuning.
Deploy the Tuned Model in Production
Incorporate Business Cost Considerations
Periodic Retraining
Use Insights for Resource Optimization
Future Enhancements
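The deployment recommendation above hinges on a reliable save/load round trip for the trained network. A minimal sketch of that cycle, using a tiny throwaway model, random data, and a temporary path as stand-ins for the notebook's tuned model and OUT_DIR artifacts:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in model with the same 40-feature input as the ReneWind data
tf.random.set_seed(0)
X = np.random.rand(64, 40).astype("float32")
y = (np.random.rand(64) > 0.9).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(40,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

# Save in the .keras format used earlier, reload, and score a batch
path = os.path.join(tempfile.mkdtemp(), "model.keras")
model.save(path)
reloaded = tf.keras.models.load_model(path)
probs = reloaded.predict(X, verbose=0).ravel()
print(probs.shape)
```

In production, the serving process would pair this with the saved scaler (and imputer, if used) so that incoming sensor batches receive exactly the training-time preprocessing before scoring.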